Lecture 4

  1. Data Input: process of encoding data and placing it in a database (G in G out).
    1. Two aspects of data:
      1. positional or geographic data necessary to define where objects or cartographic features occur.
      2. attributes that record what the cartographic feature represents.
    2. Four steps to data input:
      1. entering spatial data
        • no single method, governed by budget, types of data, data structure of GIS.
        • types of data encountered
          • existing maps.
          • aerial photographs.
          • remote sensing data (digital)
          • point sample data.
          • survey data (censuses).
        1. manual input to a vector system: type in the coordinates of points, lines, and areas.
        2. manual input into a raster system
          • points, lines, areas as sets of cells
          • choose grid size (raster)
          • transparent grid layer over maps.
          • value of a single map attribute is written down and entered.
          • develop # codes for attributes.
          • designation of areas outside of mapping region.
          • check the data after it is entered in the text file.
          • procedural problems:
            • accuracy of representation is directly dependent on grid size.
            • data volume inversely proportional to square of grid size
          • R-L codes: input data as run-length codes; many adjacent cells have similar values, especially in thematic (choropleth) maps.
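The run-length idea can be sketched as a short Python function (an illustration, not code from any particular GIS package):

```python
def run_length_encode(row):
    """Encode one raster row as (value, run length) pairs."""
    runs = []
    for value in row:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1  # extend the current run
        else:
            runs.append([value, 1])  # start a new run
    return [tuple(r) for r in runs]

# a thematic-map row with long runs of the same class code
codes = run_length_encode([3, 3, 3, 3, 1, 1, 2])
# → [(3, 4), (1, 2), (2, 1)]
```

Rows dominated by long runs compress well, which is why the technique suits choropleth-style rasters.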
        3. digitizing
          • types: electrical - an orthogonal fine wire grid. Characteristics should include stability, repeatability, linearity, resolution, and skew.
          • point digitizing: a mouse (cursor) is used to record points; a point is digitized by placing the cross hairs over it and pressing a button.
          • stream digitizing: points are recorded at a fixed rate. Straight lines usually have few points recorded; complex areas have more. Requires an experienced digitizer or a good thinning algorithm.
          • coordinate transformation: digitize and input ground coordinates for the lower left and upper right of the map to recover scale, rotation, and translation
            • translation
              E' = X + TX
              N' = Y + TY
            • scale
              E' = XSX
              N' = YSY
            • rotation
              E' = Xcosø - Ysinø
              N' = Xsinø + Ycosø
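Applied in sequence (scale, then rotate, then translate; the order here is an assumption for illustration, since the notes list the three operations separately), the transformations look like this in Python:

```python
import math

def tablet_to_ground(x, y, sx=1.0, sy=1.0, theta=0.0, tx=0.0, ty=0.0):
    """Scale, rotate, then translate a digitizer coordinate.
    The order of operations is an assumption for illustration."""
    xs, ys = sx * x, sy * y                            # scale
    xr = xs * math.cos(theta) - ys * math.sin(theta)   # rotation
    yr = xs * math.sin(theta) + ys * math.cos(theta)
    return xr + tx, yr + ty                            # translation

# rotating the point (1, 0) by 90 degrees gives approximately (0, 1)
e, n = tablet_to_ground(1.0, 0.0, theta=math.pi / 2)
```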


            A combination of the three relates digitizing tablet coordinates to the Ground Reference System and includes redundant observations (>= 4 points).

            E = a1 + a2X + a3Y + vX
            N = b1 + b2X + b3Y + vY

            RMS error (Root Mean Square error) - a measure of tic registration accuracy during digitizing and coverage transformation.

            RMS = (Σv² / (n-1))^(1/2)
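A least-squares fit of these equations and the resulting RMS can be sketched in pure Python (the helper names are hypothetical; real packages use their own registration routines):

```python
def solve3(M, b):
    """Solve a 3x3 linear system by Gaussian elimination with pivoting."""
    A = [row[:] + [bi] for row, bi in zip(M, b)]
    for i in range(3):
        p = max(range(i, 3), key=lambda r: abs(A[r][i]))
        A[i], A[p] = A[p], A[i]
        for r in range(i + 1, 3):
            f = A[r][i] / A[i][i]
            for c in range(i, 4):
                A[r][c] -= f * A[i][c]
    x = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        x[i] = (A[i][3] - sum(A[i][c] * x[c] for c in range(i + 1, 3))) / A[i][i]
    return x

def fit_tics(tablet, ground):
    """Fit E = a1 + a2*X + a3*Y to >= 4 control (tic) points via the
    normal equations, returning coefficients and the residual RMS."""
    rows = [[1.0, x, y] for x, y in tablet]
    M = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    b = [sum(r[i] * e for r, e in zip(rows, ground)) for i in range(3)]
    a = solve3(M, b)
    v = [e - (a[0] + a[1] * x + a[2] * y) for (x, y), e in zip(tablet, ground)]
    rms = (sum(vi * vi for vi in v) / (len(v) - 1)) ** 0.5
    return a, rms
```

The N equation is fit the same way, yielding b1, b2, b3.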

          • Coverage tolerances (Arc/Info terminology)
            • Fuzzy tolerance - minimum distance between all arc coordinates (nodes and vertices). Typical minimum distance between coordinates: 0.002 inches. Handles small overshoots or undershoots, automatic sliver removal, and coordinate thinning of arcs. Exercise caution when setting the fuzzy tolerance (i.e., how large is too large?).
            • Dangle length - minimum length allowed for dangling arcs; shorter dangling arcs are removed. A dangling arc has the same polygon on its left and right sides.
            • Node match tolerance - minimum distance between node features. In ADS called snap distance.
            • Weed tolerance - distance between coordinates (vertices) within each arc (Arg. in ADS).

            Coverage TOL files contain the values for the coverage's fuzzy tolerance and dangle length. These values are important because they help define the coverage's resolution.
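The node match (snap) idea can be illustrated with a few lines of Python (a simplified sketch; real coverage processing is considerably more involved):

```python
def snap_nodes(nodes, tolerance):
    """Collapse node coordinates that fall within the node match
    tolerance of an already-accepted node."""
    kept = []
    for x, y in nodes:
        for kx, ky in kept:
            if ((x - kx) ** 2 + (y - ky) ** 2) ** 0.5 <= tolerance:
                break  # within tolerance: treat as the same node
        else:
            kept.append((x, y))
    return kept

# two nodes 0.001 apart snap together at a 0.002 tolerance
# → [(0.0, 0.0), (5.0, 5.0)]
distinct = snap_nodes([(0.0, 0.0), (0.001, 0.0), (5.0, 5.0)], 0.002)
```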

        4. vector to raster conversion
          1. most data input for raster systems is done by digitization if the data is in map form.
          2. vector to raster conversion algorithms enable data input as vectors or polygons to be converted to raster grid cells.
          3. digitization for raster-based GIS is usually done using polygon digitization, not chain digitization as is the norm with vector-based GIS packages.
          4. relative loss of information when converting from vector to raster is a function of grid size.
          5. automated scanning
            • raster scan: uses CCD arrays to record information in binary form. A CCD is a semiconductor that translates photons into counts of electrons. Raster to vector conversion is needed for vector GIS; thinning and weeding of lines is necessary.
          6. satellite data: already in raster form for raster GIS; needs raster to vector conversion for vector GIS.
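A minimal vector-to-raster sketch (a cell-center test with an even-odd point-in-polygon rule; production rasterizers also handle boundary cells and priority rules):

```python
def point_in_polygon(px, py, poly):
    """Even-odd (ray casting) point-in-polygon test."""
    inside = False
    for i in range(len(poly)):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % len(poly)]
        if (y1 > py) != (y2 > py):  # edge crosses the horizontal ray
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside

def rasterize(poly, ncols, nrows, cell_size):
    """Mark each grid cell whose center falls inside the polygon.
    Coarser cell sizes lose more of the polygon's detail."""
    return [[1 if point_in_polygon((c + 0.5) * cell_size,
                                   (r + 0.5) * cell_size, poly) else 0
             for c in range(ncols)]
            for r in range(nrows)]
```

Running the same polygon through progressively larger cell sizes makes the information loss discussed above directly visible.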
      2. entering non-spatial associated attributes
            1. spatial
              • road coordinates
              • tree-stand coordinates
            2. attributes
              • speed limit, surface type, political control, etc.
              • species, size class, density, site index, accessibility, soil type, etc.
      3. Linking spatial and non-spatial data
            • attach feature codes to graphic entities.
            • attach unique identifiers, then enter non-spatial data afterwards.
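Linking through a shared identifier is essentially a relational join; a toy version in Python (the field names here are hypothetical):

```python
def join_on_id(features, attributes, key="user_id"):
    """Join spatial feature records to non-spatial attribute records
    through a common identifier (a relational join)."""
    lookup = {row[key]: row for row in attributes}
    joined = []
    for feat in features:
        merged = dict(feat)
        extra = lookup.get(feat[key], {})
        merged.update({k: v for k, v in extra.items() if k != key})
        joined.append(merged)
    return joined

# hypothetical road coverage records and their attribute file
roads = [{"user_id": 7, "length": 120.0}]
road_attrs = [{"user_id": 7, "speed_limit": 55, "surface": "asphalt"}]
linked = join_on_id(roads, road_attrs)
```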
      4. Data Validation
        1. errors encoding spatial and non-spatial data
          1. spatial data incomplete.
          2. spatial data in wrong place.
          3. spatial data at wrong scale.
          4. spatial data is distorted.
          5. spatial data in different coordinate system.
          6. topology of spatial data is incomplete.
          7. spatial data linked to wrong non-spatial data.
          8. non-spatial data incomplete.
        2. data verification
          1. spatial: compare plotted data to original.
          2. non-spatial: print out columns and check.
    3. Data input processes
      1. Digitization
        1. Prepare the map sheet for digitizing.
          Questions:
          • Do all lines connect and polygons close?
          • Is there one label point with a unique User-ID in each polygon?
          • Visually edge match adjacent sheets.
          • Does map have tics for control?
        2. Digitize the coverage.
          Digitize as arcs, with nodes as endpoints and vertices along the arc describing its cartographic detail. All are captured as a series of x,y coordinates. Points, lines, and polygons can be digitized. In Arc/Info, BUILD is used to create a point attribute table, and BUILD or CLEAN are used to create the Arc Attribute Table for a line coverage and to create Polygon Attribute Tables.

        3. Identify and correct digitizing errors.
          Questions:
          • Arcs accurately traced (overlay maps)?
          • Arcs or label points missing?
          • Do nodes match properly (pseudo nodes or dangling nodes)?
        4. Define features and build topology.
          Performed after all digitizing errors have been repaired. Feature topology and minimal feature attribute tables should be created using CLEAN and BUILD.

        5. Identify and correct topology errors.
          Pseudo nodes, dangling nodes (overshoots and undershoots), weird polygons, slivers, and label errors (more than one label point in a polygon, or none).

        6. Assign attributes to coverage features.
          Link attribute data using a common User-ID in both the minimal attribute tables and the other attribute files associated with them. Uses a relational join to link the files.

        7. Identify and correct attribute coding errors.
      2. Conflation and Rubber Sheeting
        For different coverages or objects to be analyzed simultaneously, they must be in the same coordinate system (same projection and datum). At times, however, coverages in the same coordinate system may not overlay properly (the same features occur at different locations), often because of inaccuracies in the compilation of one of the object types or coverages. When this is the case it is common to conflate the less accurate coverage to the more accurate one. Two approaches are possible: a global and a local operation. A global approach can eliminate systematic geometric errors through a coordinate transformation, which corrects for linear scale change, rotation, and translation at first order and warping at higher orders. To correct for non-systematic errors, a process of rubber sheeting is used in which many control points are spread throughout the model and corrections are made based on local differences between the control points on the more accurate coverage and the same points on the less accurate coverage. The change in the old coordinates is a weighted sum of the differences (deltas) in x and y at the control points.
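The weighted sum of control-point deltas can be sketched with inverse-distance weighting (one common choice; the actual weighting scheme varies by implementation):

```python
def rubber_sheet(x, y, control, power=2.0):
    """Shift (x, y) by an inverse-distance-weighted sum of the control
    points' displacements. Each control entry is (cx, cy, dx, dy)."""
    wx = wy = wsum = 0.0
    for cx, cy, dx, dy in control:
        d = ((x - cx) ** 2 + (y - cy) ** 2) ** 0.5
        if d == 0.0:
            return x + dx, y + dy  # exactly on a control point
        w = 1.0 / d ** power
        wx += w * dx
        wy += w * dy
        wsum += w
    return x + wx / wsum, y + wy / wsum
```

As a sanity check, when every control point carries the same displacement, every adjusted point shifts by exactly that amount; nearer control points dominate when the displacements differ.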